Wikipedia citations: A comprehensive data set of citations with identifiers extracted from English Wikipedia
Abstract
Wikipedia’s content is based on reliable and published sources. To this date, relatively little is known about what sources Wikipedia relies on, in part because extracting citations and identifying cited sources is challenging. To close this gap, we release Wikipedia Citations, a comprehensive data set of citations extracted from Wikipedia. We extracted 29.3 million citations from 6.1 million English Wikipedia articles as of May 2020, and classified them as books, journal articles, or Web content. We were thus able to extract 4.0 million citations to scholarly publications with known identifiers (including DOI, PMC, PMID, and ISBN) and further equip an extra 261 thousand citations with DOIs from Crossref. As a result, we find that 6.7% of Wikipedia articles cite at least one journal article with an associated DOI, and that Wikipedia cites just 2% of all articles with a DOI currently indexed in the Web of Science. We release our code to allow the community to extend upon our work and update the data set in the future.
Similar resources
Scientific citations in Wikipedia
The Internet-based encyclopædia Wikipedia has grown to become one of the most visited web-sites on the Internet. However, critics have questioned the quality of entries, and an empirical study has shown Wikipedia to contain errors in a 2005 sample of science entries. Biased coverage and lack of sources are among the “Wikipedia risks”. The present work describes a simple assessment of these aspe...
Clustering of scientific citations in Wikipedia
The instances of templates in Wikipedia form an interesting data set of structured information. Here I focus on the cite journal template that is primarily used for citation to articles in scientific journals. These citations can be extracted and analyzed: Non-negative matrix factorization is performed on a (article × journal) matrix resulting in a soft clustering of Wikipedia articles and scie...
Wikipedia as a gateway to biomedical research: The relative distribution and use of citations in the English Wikipedia
Wikipedia is a gateway to knowledge. However, the extent to which this gateway ends at Wikipedia or continues via supporting citations is unknown. Wikipedia's gateway functionality has implications for information design and education, notably in medicine. This study aims to establish benchmarks for the relative distribution and referral (click) rate of citations, as indicated by presence of a D...
Large SMT data-sets extracted from Wikipedia
The article presents experiments on mining Wikipedia for extracting SMT useful sentence pairs in three language pairs. Each extracted sentence pair is associated with a cross-lingual lexical similarity score based on which, several evaluations have been conducted to estimate the similarity thresholds which allow the extraction of the most useful data for training three-language pairs SMT system...
E-citations: actionable identifiers and scholarly referencing
This document discusses the role of "actionable" identifiers such as the Digital Object Identifier (DOI) in enabling scholarly citations in a digital environment. Citation is a sub-set of the general wider concept of linkage, but an interesting one for two reasons: it is a practical example being worked on today, and it demonstrates that linkage only between digital entities is insufficient for...
Journal
Journal title: Quantitative Science Studies
Year: 2021
ISSN: 2641-3337
DOI: https://doi.org/10.1162/qss_a_00105